Fix flaky TestStorageProviderUploadDownload integration test #21

Closed
jrepp wants to merge 4 commits into main from fix/flaky-storage-integration-test

jrepp commented Nov 8, 2025

Summary

Fixes the flaky TestStorageProviderUploadDownload integration test by improving Docker Compose setup, adding retry logic, and enhancing test logging.

Problem

The integration test was failing due to port conflicts when Docker containers from previous test runs hadn't fully cleaned up. Port 5556 (used by the dex service) was being held by gvproxy even after docker compose down, causing subsequent test runs to fail with "address already in use" errors.

Test Results Before Fix:

  • 0/20 test runs passed (100% failure rate)
  • All failures due to port 5556 conflicts

Solution

1. Improved Docker Compose Setup

  • Added retry logic (3 attempts) with exponential backoff
  • Implemented 2-second wait after cleanup for port release
  • Enhanced logging at each setup step

2. Enhanced Test Logging

  • Added detailed timing information for each operation
  • Added object verification step before download
  • Added 100ms stabilization wait after upload
  • Enhanced error messages with detailed diagnostics
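
The steps above (timed operations, a verification step before download, and a short wait after upload) might look roughly like this. It is a sketch only: `objectStore`, `memStore`, `timed`, and `roundTrip` are hypothetical names, and the in-memory store is a stand-in for the real storage provider. In an actual test, `logf` would typically be `t.Logf`.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"time"
)

// objectStore is a hypothetical stand-in for the storage provider under test.
type objectStore interface {
	Upload(key string, data []byte) error
	Exists(key string) (bool, error)
	Download(key string) ([]byte, error)
}

// memStore is an in-memory objectStore used only to make the sketch runnable.
type memStore map[string][]byte

func (m memStore) Upload(key string, data []byte) error {
	m[key] = append([]byte(nil), data...)
	return nil
}

func (m memStore) Exists(key string) (bool, error) {
	_, ok := m[key]
	return ok, nil
}

func (m memStore) Download(key string) ([]byte, error) {
	data, ok := m[key]
	if !ok {
		return nil, fmt.Errorf("object %q not found", key)
	}
	return data, nil
}

// timed runs step, logs how long it took, and returns step's error.
func timed(logf func(string, ...any), name string, step func() error) error {
	start := time.Now()
	err := step()
	logf("%s took %s (err=%v)", name, time.Since(start), err)
	return err
}

// roundTrip uploads want, waits briefly, verifies the object exists,
// downloads it, and compares the bytes.
func roundTrip(s objectStore, logf func(string, ...any), key string, want []byte) error {
	if err := timed(logf, "upload", func() error { return s.Upload(key, want) }); err != nil {
		return err
	}
	time.Sleep(100 * time.Millisecond) // stabilization wait, as described in the PR
	if err := timed(logf, "verify", func() error {
		ok, err := s.Exists(key)
		if err != nil {
			return err
		}
		if !ok {
			return fmt.Errorf("object %q not found after upload", key)
		}
		return nil
	}); err != nil {
		return err
	}
	var got []byte
	if err := timed(logf, "download", func() error {
		var derr error
		got, derr = s.Download(key)
		return derr
	}); err != nil {
		return err
	}
	if !bytes.Equal(got, want) {
		return fmt.Errorf("data mismatch for %q: got %d bytes, want %d bytes", key, len(got), len(want))
	}
	return nil
}

func main() {
	if err := roundTrip(memStore{}, log.Printf, "example-key", []byte("hello")); err != nil {
		log.Fatal(err)
	}
}
```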

3. Test Infrastructure

  • Added run_flaky_test.sh utility script for debugging flaky tests
  • Added test_logs/ to .gitignore

Test Results After Fix

Verification:

  • 5/5 consecutive test runs passed (100% success rate)
  • Average test duration: ~44 seconds
  • No port conflicts observed

Changes

  • testing/integration_test.go - Retry logic and cleanup improvements
  • testing/storage_integration_test.go - Comprehensive logging
  • run_flaky_test.sh - New debugging utility
  • .gitignore - Exclude test artifacts

Testing

Run the test multiple times to verify stability:

for i in 1 2 3 4 5; do go test -v -run "^TestStorageProviderUploadDownload$" -timeout 90s; done

Or use the included script:

./run_flaky_test.sh

jrepp added 4 commits November 8, 2025 13:46
Exclude the test_logs directory created by test debugging scripts
to prevent test artifacts from being committed to the repository.

Enhanced the Docker Compose setup for integration tests to handle
transient failures and port conflicts:

- Add retry logic (3 attempts) for starting services
- Implement exponential backoff between retries
- Add 2-second wait after cleanup for port release
- Improve logging at each setup step

This fixes flaky test failures caused by port conflicts when
containers from previous test runs haven't fully cleaned up.

Enhanced test with detailed logging and verification steps:

- Add timing information for each operation
- Add object verification step before download
- Add 100ms stabilization wait after upload
- Add detailed error messages for data mismatches
- Log connection details and test progress

This logging helps diagnose flaky test behavior and provides
better visibility into test execution for debugging purposes.

Add run_flaky_test.sh utility script that runs integration tests
multiple times to help identify and debug flaky test behavior.

Features:
- Configurable number of test runs (default: 20)
- Detailed logging of each run
- Summary report with pass/fail statistics
- Saves logs for failed tests for analysis

This tool was used to diagnose and fix the flaky
TestStorageProviderUploadDownload test.
jrepp commented Nov 8, 2025

Superseded by #22, which integrates the stability testing methodology directly into the test suite rather than using an external bash script.

jrepp closed this Nov 8, 2025